11 research outputs found
Temporal error concealment for fisheye video sequences based on equisolid re-projection
Wide-angle video sequences obtained by fisheye cameras exhibit
characteristics that may not comply well with standard image and video
processing techniques such as error concealment. This paper introduces a
temporal error concealment technique designed for the inherent characteristics
of equisolid fisheye video sequences: part of the error concealment is
conducted in the perspective domain, followed by a re-projection into the
equisolid domain. Combining this technique with conventional decoder motion
vector estimation achieves average gains of 0.71 dB compared against pure
decoder motion vector estimation for the test sequences used, with gains of
up to 2.04 dB for selected frames.
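The re-projection step above rests on the equisolid-angle camera model. A minimal sketch of the mappings involved, assuming a normalized focal length f = 1.0 (a placeholder, not a value from the paper):

```python
import numpy as np

# The equisolid-angle fisheye model maps a ray at incidence angle theta
# to the image radius r = 2 * f * sin(theta / 2); the standard
# perspective (pinhole) model uses r = f * tan(theta).

def equisolid_radius(theta, f):
    """Image radius of a ray at incidence angle theta (equisolid model)."""
    return 2.0 * f * np.sin(theta / 2.0)

def equisolid_angle(r, f):
    """Inverse mapping: incidence angle for a given image radius."""
    return 2.0 * np.arcsin(r / (2.0 * f))

def perspective_radius(theta, f):
    """Image radius under the perspective (pinhole) model."""
    return f * np.tan(theta)

# Round trip fisheye -> angle -> perspective, as needed when concealing
# in the perspective domain and re-projecting into the equisolid domain.
f = 1.0
theta = np.deg2rad(60.0)
r_persp = perspective_radius(equisolid_angle(equisolid_radius(theta, f), f), f)
```

Per-pixel, the actual technique would apply such a mapping to pixel coordinates rather than to a single angle; the sketch only shows the radial model change.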
Learning to Predict Image-based Rendering Artifacts with Respect to a Hidden Reference Image
Image metrics predict the perceived per-pixel difference between a reference
image and its degraded (e.g., re-rendered) version. In several important
applications, the reference image is not available and image metrics cannot be
applied. We devise a neural network architecture and training procedure that
allows predicting the MSE, SSIM, or VGG16 image difference from the distorted
image alone while the reference is not observed. This is enabled by two
insights: The first is to inject sufficiently many undistorted natural image
patches, which can be found in arbitrary amounts and are known to have no
perceivable difference to themselves. This avoids false positives. The second
is to balance the learning such that all image error magnitudes are equally
likely, avoiding false negatives. Surprisingly, we observe that the resulting
no-reference metric can, subjectively, even perform better than the
reference-based one, as it had to become robust against misalignments. We
evaluate the effectiveness of our approach in an image-based rendering
context, both quantitatively and qualitatively. Finally, we demonstrate two
applications which reduce light field capture time and provide guidance for
interactive depth adjustment.
Comment: 13 pages, 11 figures
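The two training insights above can be sketched as a data-preparation step, here with per-patch MSE as the target metric; the function name, the binning scheme, and the patch sizes are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_set(refs, distorted, n_bins=4):
    """Pair each distorted patch with its hidden-reference MSE, inject
    pristine patches with a zero target (avoids false positives), then
    equalize the error bins (avoids false negatives)."""
    targets = [float(np.mean((r - d) ** 2)) for r, d in zip(refs, distorted)]
    samples = list(zip(distorted, targets))
    samples += [(r, 0.0) for r in refs]            # zero-error anchors
    edges = np.linspace(0.0, max(t for _, t in samples), n_bins + 1)
    bins = [[s for s in samples if edges[i] <= s[1] <= edges[i + 1]]
            for i in range(n_bins)]
    bins = [b for b in bins if b]                  # drop empty bins
    per_bin = min(len(b) for b in bins)
    return [s for b in bins for s in b[:per_bin]]  # equal-sized bins

refs = [rng.random((8, 8)) for _ in range(16)]    # synthetic "natural" patches
dist = [r + rng.normal(0.0, 0.1, r.shape) for r in refs]
train = make_training_set(refs, dist)
```

The zero-target pristine patches anchor the low end of the metric's output range, while the equal-sized bins keep the regressor from collapsing onto the most frequent error magnitude.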
Pixel-Wise Confidences for Stereo Disparities Using Recurrent Neural Networks
One of the inherent problems of stereo disparity estimation algorithms is the lack of reliability information for the computed disparities. As a consequence, errors from the initial disparity maps are propagated to subsequent processing steps such as view rendering. Confidence measures are among the most popular techniques to address this because of their capability to detect disparity outliers. Recently, convolutional neural network based confidence measures achieved the best results by directly processing initial disparity maps. In contrast to existing convolutional neural network based methods, we propose a novel recurrent neural network architecture to compute confidences for different stereo matching algorithms. To maintain a low complexity, the confidence for a given pixel is computed purely from its associated matching costs, without considering any additional neighbouring pixels. Compared to state-of-the-art confidence prediction methods leveraging convolutional neural networks, the proposed network is simpler and smaller (a reduction of the number of trainable parameters by almost three to four orders of magnitude). Moreover, experimental results on three well-known datasets as well as with two popular stereo algorithms clearly show that the proposed approach outperforms state-of-the-art confidence estimation techniques.
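The core idea above, a confidence computed from a single pixel's matching-cost curve with no spatial context, can be sketched with a tiny recurrent cell; the weights here are untrained random placeholders, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(42)
HIDDEN = 8

W_in = rng.normal(0.0, 0.1, (HIDDEN,))        # cost -> hidden
W_h = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))  # hidden -> hidden
w_out = rng.normal(0.0, 0.1, HIDDEN)          # hidden -> confidence logit

def pixel_confidence(costs):
    """Confidence in (0, 1) for one pixel, computed from its matching
    costs alone: a vanilla RNN scans the cost curve, one disparity
    hypothesis per step, with no neighbouring pixels involved."""
    h = np.zeros(HIDDEN)
    for c in costs:                           # scan the cost curve
        h = np.tanh(W_in * c + W_h @ h)
    return 1.0 / (1.0 + np.exp(-w_out @ h))   # sigmoid

costs = rng.random(64)                        # e.g. 64 disparity hypotheses
conf = pixel_confidence(costs)
```

Even with a hidden size of 8, this cell has only 80 parameters, which illustrates why a per-pixel recurrent model can be orders of magnitude smaller than a convolutional one operating on whole disparity maps.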
Non-planar inside-out dense light-field dataset and reconstruction pipeline
Light-field imaging provides full spatio-angular information of the real world by capturing light rays in various directions. This allows image processing algorithms to deliver immersive user experiences such as VR. To evaluate and develop reconstruction algorithms, a precise and dense light-field dataset of the real world that can be used as ground truth is desirable. Fraunhofer IIS presents a dataset that includes two scenes captured by an accurate industrial robot with an attached color camera looking outward. The arm moves on a cylindrical path covering a field of view of 125 degrees with an angular step size of 0.01 degrees. The images are pre-processed in several steps. The disparity between two adjacent views at a resolution of 5168x3448 is less than 1.6 pixels; the parallax between the foreground and the background objects is less than 0.6 pixels. The dataset is based on the paper "Non-planar inside-out dense light-field dataset and reconstruction pipeline" by Faezeh Sadat Zakeri, Ahmed Durmush, Matthias Ziegler, Michel Bätz, and Joachim Keinert.
TEDDY: A High-Resolution High Dynamic Range Light-Field Dataset
Light-field (LF) imaging has various advantages over traditional 2D photography, providing angular information of the real-world scene by separately recording light rays in different directions. Despite the directional light information, which enables new capabilities such as depth estimation, post-capture refocusing, and 3D modelling, currently available light-field datasets are very restricted in terms of spatial resolution and dynamic range. We address this problem by capturing a novel light-field dataset featuring both a high spatial resolution and a high dynamic range (HDR). This dataset should enable the community to research and develop efficient reconstruction and tone-mapping algorithms for a hyper-realistic visual experience. The dataset consists of six static light-fields captured by a high-quality digital camera mounted on two precise linear axes, using exposure bracketing at each view point. See Readme.txt.
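For context on the exposure-bracketing step, a generic HDR merge (not the dataset's actual pipeline) divides each linear exposure by its shutter time and blends the results with weights that down-rank under- and over-exposed pixels; all values below are synthetic:

```python
import numpy as np

def merge_hdr(exposures, times):
    """exposures: linear images in [0, 1]; times: shutter times.
    Returns a per-pixel radiance estimate."""
    num = np.zeros_like(exposures[0], dtype=float)
    den = np.zeros_like(exposures[0], dtype=float)
    for img, t in zip(exposures, times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # hat weighting, peak at 0.5
        num += w * img / t                   # radiance estimate per shot
        den += w
    return num / np.maximum(den, 1e-8)

rng = np.random.default_rng(3)
radiance = rng.uniform(0.05, 4.0, (4, 4))    # synthetic scene radiance
times = [1 / 60, 1 / 15, 1 / 4]
shots = [np.clip(radiance * t, 0.0, 1.0) for t in times]
hdr = merge_hdr(shots, times)
```

With noiseless synthetic shots, every pixel is unclipped in at least one exposure, so the merge recovers the radiance map exactly; real captures would additionally need a camera response curve and noise handling.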
A Novel Confidence Measure for Disparity Maps by Pixel-Wise Cost Function Analysis
Disparity estimation algorithms mostly lack information about the reliability of the computed disparities. Therefore, errors in the initial disparity maps are propagated to consecutive processing steps. This is particularly problematic for difficult scene elements, e.g., periodic structures. Consequently, we introduce a simple yet novel confidence measure that filters out wrongly computed disparities, resulting in improved final disparity maps. To demonstrate the benefit of this approach, we compare our method with existing state-of-the-art confidence measures and show that we improve the ability to detect false disparities by 54.2%.
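As a generic illustration of deriving a confidence from per-pixel cost analysis (a classic peak-ratio-style measure, not necessarily the paper's), ambiguous cost curves such as those of periodic structures yield two similar minima and hence low confidence; the threshold is a placeholder:

```python
import numpy as np

def peak_ratio_confidence(costs):
    """Confidence from one pixel's matching-cost curve: a distinct best
    minimum (c1 much smaller than c2) gives high confidence, two
    near-equal minima give confidence near zero."""
    c1, c2 = np.sort(costs)[:2]
    return 1.0 - c1 / c2 if c2 > 0 else 0.0

def filter_disparities(disparity, cost_volume, threshold=0.2):
    """Invalidate (NaN) disparities whose confidence is below threshold."""
    conf = np.apply_along_axis(peak_ratio_confidence, -1, cost_volume)
    out = disparity.astype(float).copy()
    out[conf < threshold] = np.nan
    return out

distinct = peak_ratio_confidence(np.array([0.1, 0.9, 0.8]))   # clear minimum
ambiguous = peak_ratio_confidence(np.array([0.5, 0.52, 0.51]))  # periodic-like
```

Filtering the disparity map with such a measure is what "filters out wrongly computed disparities" amounts to in practice: unreliable pixels are invalidated and can later be inpainted from confident neighbours.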
Light-field view synthesis using convolutional block attention module
Consumer light-field (LF) cameras suffer from a low or limited resolution
because of the angular-spatial trade-off. To alleviate this drawback, we
propose a novel learning-based approach utilizing an attention mechanism to
synthesize novel views of a light-field image using a sparse set of input views
(i.e., 4 corner views) from a camera array. In the proposed method, we divide
the process into three stages: stereo-feature extraction, disparity estimation,
and final image refinement. We use three sequential convolutional neural
networks, one for each stage. A residual convolutional block attention module (CBAM)
is employed for final adaptive image refinement. Attention modules are helpful
in learning and focusing more on the important features of the image and are
thus sequentially applied in the channel and spatial dimensions. Experimental
results show the robustness of the proposed method. Our proposed network
outperforms the state-of-the-art learning-based light-field view synthesis
methods on two challenging real-world datasets by 0.5 dB on average.
Furthermore, we provide an ablation study to substantiate our findings.
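A rough numpy sketch of the CBAM idea, channel attention followed by spatial attention applied sequentially, with random placeholder weights; the real module uses a 7x7 convolution in the spatial branch, simplified here to an element-wise gate:

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 4, 6, 6
reduction = 2

W1 = rng.normal(0.0, 0.1, (C // reduction, C))  # shared MLP, layer 1
W2 = rng.normal(0.0, 0.1, (C, C // reduction))  # shared MLP, layer 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """Gate each channel using avg- and max-pooled descriptors fed
    through a shared two-layer MLP."""
    avg = x.mean(axis=(1, 2))                   # (C,)
    mx = x.max(axis=(1, 2))                     # (C,)
    att = sigmoid(W2 @ np.maximum(W1 @ avg, 0.0)
                  + W2 @ np.maximum(W1 @ mx, 0.0))
    return x * att[:, None, None]

def spatial_attention(x):
    """Gate each spatial location using channel-pooled maps (the 7x7
    conv of real CBAM is replaced by a simple sum gate)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    att = sigmoid(pooled.sum(axis=0))
    return x * att[None, :, :]

def cbam(x):
    """Sequential channel-then-spatial attention over a (C, H, W) map."""
    return spatial_attention(channel_attention(x))

feat = rng.normal(size=(C, H, W))
out = cbam(feat)
```

Because both gates lie in (0, 1), the module can only re-weight features, never amplify them, which matches its role here as an adaptive refinement rather than a synthesis step.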